In the following program, we will guide you through using Pandas to process the emission data for TensorFlow machine learning. Then we will teach you how to create and train your TensorFlow model. Answer the questions when you see Q; follow the steps in To-do. When you see something like $^{D1}$ or $^{M1}$ next to a problem, refer to the rubrics to see how the problem will be graded, as those problems are worth points.
Note: Hit the "Run" button to run the program block by block. We don't recommend using "Run All" in "Cell" because the first few blocks only need to be run once and they take some time to run.
The following block imports the necessary libraries in Python. You might encounter an error while trying to import tensorflow. This is because TensorFlow is not a default library that comes with the Python package you installed. Go to https://www.tensorflow.org/install/pip#system-install and follow the instructions for installing TensorFlow. If you encounter problems while installing TensorFlow, you can add --user after pip install; this happens because you did not create a virtual environment for your Python packages. You can follow Step 2 on the website to create a virtual environment (recommended), or you can install the package in your HOME environment. If you encounter errors while importing other libraries, use the same pip method described above.
pandas is used to process our data.
numpy is a great tool for mathematical processing and array creation.
sklearn is used to split the data into training, testing, and validation sets.
# Import Libraries
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers
from sklearn.model_selection import train_test_split
import seaborn as sns
from matplotlib import pyplot as plt
2021-11-20 21:44:59.136142: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory 2021-11-20 21:44:59.136249: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
# Load the TensorBoard notebook extension.
%load_ext tensorboard
from datetime import datetime
from packaging import version
print("TensorFlow version: ", tf.__version__)
assert version.parse(tf.__version__).release[0] >= 2, \
"This notebook requires TensorFlow 2.0 or above."
import tensorboard
tensorboard.__version__
TensorFlow version: 2.7.0
'2.7.0'
import random
random.seed(20211120)
tf.random.set_seed(20211120)
np.random.seed(20211120)
To process the data, save the .csv file you downloaded from the Google Drive to the same directory where this Notebook is located.
pd.read_csv("file path") reads the data into emission_train. We can call pd directly because we imported pandas as pd.
.head(100) returns the first 100 rows of data. Note that when displaying, some rows are truncated. This is normal since the rows are too long.
.describe() shows statistical data for our data frame.
# Loading the large data set; it may take a while.
emission_train = pd.read_csv("emission.csv", delimiter=",", quoting = 3)
Here is a link that contains information about the meaning of the columns in "emission.csv": https://sumo.dlr.de/docs/Simulation/Output/EmissionOutput.html
display(emission_train.head(100))
display(emission_train.describe())
| timestep_time | vehicle_CO | vehicle_CO2 | vehicle_HC | vehicle_NOx | vehicle_PMx | vehicle_angle | vehicle_eclass | vehicle_electricity | vehicle_fuel | vehicle_id | vehicle_lane | vehicle_noise | vehicle_pos | vehicle_route | vehicle_speed | vehicle_type | vehicle_waiting | vehicle_x | vehicle_y | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 15.20 | 7380.56 | 0.00 | 84.89 | 2.21 | 50.28 | HBEFA3/HDV | 0.0 | 3.13 | truck0 | 5329992#5_0 | 67.11 | 7.20 | !truck0!var#1 | 0.00 | truck_truck | 0.0 | 18275.04 | 26987.78 |
| 1 | 0.0 | 0.00 | 2416.04 | 0.01 | 0.72 | 0.01 | 42.25 | HBEFA3/PC_G_EU4 | 0.0 | 1.04 | veh0 | 5330181#0_0 | 65.15 | 5.10 | !veh0!var#1 | 14.72 | veh_passenger | 0.0 | 18279.94 | 24533.12 |
| 2 | 1.0 | 17.92 | 9898.93 | 0.00 | 103.38 | 2.49 | 50.28 | HBEFA3/HDV | 0.0 | 4.20 | truck0 | 5329992#5_0 | 73.20 | 8.21 | !truck0!var#1 | 1.01 | truck_truck | 0.0 | 18275.82 | 26988.43 |
| 3 | 1.0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 42.25 | HBEFA3/PC_G_EU4 | 0.0 | 0.00 | veh0 | 5330181#0_0 | 62.72 | 18.85 | !veh0!var#1 | 13.75 | veh_passenger | 0.0 | 18289.19 | 24543.30 |
| 4 | 1.0 | 164.78 | 2624.72 | 0.81 | 1.20 | 0.07 | 357.00 | HBEFA3/PC_G_EU4 | 0.0 | 1.13 | veh1 | -5338968#2_0 | 55.94 | 5.10 | !veh1!var#1 | 0.00 | veh_passenger | 0.0 | 29252.01 | 24424.16 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | 7.0 | 23.44 | 2578.06 | 0.15 | 0.64 | 0.05 | 0.13 | HBEFA3/LDV_G_EU6 | 0.0 | 1.11 | moto2 | -5341858#10_0 | 63.24 | 35.78 | !moto2!var#1 | 11.62 | moto_motorcycle | 0.0 | 26468.26 | 25548.47 |
| 96 | 7.0 | 732.32 | 18759.70 | 3.34 | 3.79 | 1.19 | 179.93 | HBEFA3/LDV_G_EU6 | 0.0 | 8.07 | moto3 | -342586098#36_0 | 81.67 | 30.96 | !moto3!var#1 | 13.99 | moto_motorcycle | 0.0 | 24729.15 | 27450.68 |
| 97 | 7.0 | 294.68 | 6949.38 | 1.29 | 1.47 | 0.43 | 179.93 | HBEFA3/LDV_G_EU6 | 0.0 | 2.99 | moto4 | 5331636#0_0 | 72.45 | 11.88 | !moto4!var#1 | 6.37 | moto_motorcycle | 0.0 | 29159.96 | 25066.29 |
| 98 | 7.0 | 236.07 | 4292.19 | 0.97 | 0.93 | 0.30 | 1.91 | HBEFA3/LDV_G_EU6 | 0.0 | 1.85 | moto5 | 5340657#0_0 | 71.73 | 5.60 | !moto5!var#1 | 3.30 | moto_motorcycle | 0.0 | 24340.58 | 28198.87 |
| 99 | 7.0 | 179.19 | 1228.61 | 0.64 | 0.31 | 0.17 | 180.06 | HBEFA3/LDV_G_EU6 | 0.0 | 0.53 | moto6 | 5339596#0_0 | 55.94 | 2.30 | !moto6!var#1 | 0.00 | moto_motorcycle | 0.0 | 26577.70 | 25847.92 |
100 rows × 20 columns
| timestep_time | vehicle_CO | vehicle_CO2 | vehicle_HC | vehicle_NOx | vehicle_PMx | vehicle_angle | vehicle_electricity | vehicle_fuel | vehicle_noise | vehicle_pos | vehicle_speed | vehicle_waiting | vehicle_x | vehicle_y | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 1.633101e+07 | 1.633101e+07 | 1.633101e+07 | 1.633101e+07 | 1.633101e+07 | 1.633101e+07 | 1.633101e+07 | 16331007.0 | 1.633101e+07 | 1.633101e+07 | 1.633101e+07 | 1.633101e+07 | 1.633101e+07 | 1.633101e+07 | 1.633101e+07 |
| mean | 4.112561e+03 | 5.764304e+01 | 4.919050e+03 | 7.284125e-01 | 1.769589e+01 | 4.227491e-01 | 1.633698e+02 | 0.0 | 2.105266e+00 | 6.636207e+01 | 2.162082e+02 | 1.331140e+01 | 3.385107e+00 | 2.458506e+04 | 2.496505e+04 |
| std | 2.168986e+03 | 8.854365e+01 | 7.959043e+03 | 1.589816e+00 | 5.993168e+01 | 1.164065e+00 | 1.051232e+02 | 0.0 | 3.389028e+00 | 7.389330e+00 | 6.034189e+02 | 8.833069e+00 | 1.914152e+01 | 4.016049e+03 | 3.045771e+03 |
| min | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.0 | 0.000000e+00 | 1.258000e+01 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 9.960000e+00 | -1.490000e+00 |
| 25% | 2.291000e+03 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 9.031000e+01 | 0.0 | 0.000000e+00 | 6.249000e+01 | 2.383000e+01 | 6.550000e+00 | 0.000000e+00 | 2.219207e+04 | 2.349907e+04 |
| 50% | 4.133000e+03 | 2.017000e+01 | 2.624720e+03 | 1.500000e-01 | 1.200000e+00 | 6.000000e-02 | 1.799600e+02 | 0.0 | 1.130000e+00 | 6.711000e+01 | 7.199000e+01 | 1.337000e+01 | 0.000000e+00 | 2.393805e+04 | 2.548033e+04 |
| 75% | 5.903000e+03 | 1.034400e+02 | 6.161010e+03 | 7.600000e-01 | 2.710000e+00 | 1.500000e-01 | 2.703500e+02 | 0.0 | 2.650000e+00 | 7.112000e+01 | 1.780600e+02 | 1.999000e+01 | 0.000000e+00 | 2.691704e+04 | 2.672322e+04 |
| max | 1.441800e+04 | 3.932950e+03 | 1.153026e+05 | 1.729000e+01 | 8.864200e+02 | 1.432000e+01 | 3.600000e+02 | 0.0 | 4.888000e+01 | 1.019600e+02 | 1.943554e+04 | 5.013000e+01 | 3.970000e+02 | 4.492832e+04 | 4.753314e+04 |
Below we use sns.pairplot() to show you the 2D plots between pairs of columns. We only use 5% of the data, randomly extracted from emission_train, to make the plots because using too much data might crash the program. .sample(frac=0.05) randomly takes that fraction of samples from the DataFrame.
del frees up memory for Python. However, it won't release the memory back to the operating system. From the pair plots you can visualize the relationships between the columns in the dataset. For example, vehicle_CO2 and vehicle_fuel have a linear relationship, while vehicle_CO2 and vehicle_pos have a parabolic or exponential-like relationship. Some pairs might have a relationship that is not easily identified from the pair plots.
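As a small illustration of .sample(frac=...) and del, here is a toy sketch (the DataFrame is made up and is not part of the lab code):

```python
import pandas as pd

# A toy stand-in for emission_train (made-up values)
df = pd.DataFrame({"x": range(1000)})

subset = df.sample(frac=0.05)  # randomly take 5% of the rows
print(len(subset))             # 50

del subset  # the name is released; Python may reuse the memory internally,
            # but the operating system may not get it back
```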
$^{D1}$Q: What do you find from the Pairplot? Find three pairs of data and list what you observe from their pair plots.
Type your answers to Q:
correlation_graph_data = emission_train.sample(frac=0.05).reset_index(drop=True)
print(len(emission_train), 'emission_train')
print(len(correlation_graph_data), 'correlation_graph_data')
sns.pairplot(correlation_graph_data[['vehicle_CO2', 'vehicle_angle', 'vehicle_fuel', 'vehicle_noise', 'vehicle_pos', 'vehicle_speed', 'vehicle_waiting', 'vehicle_x', 'vehicle_y']], diag_kind='kde')
#Free up memory for Python
del correlation_graph_data
16331008 emission_train
816550 correlation_graph_data
Note that there are emission data like vehicle_CO, vehicle_CO2, vehicle_HC, vehicle_NOx, vehicle_PMx in the dataset. In this lab, we only want to look at vehicle_CO2.
After looking at the data, you might notice there is a lot of data we don't want for our machine learning. For example, all the vehicle_electricity values are zeros, and the vehicle_route data are only used to keep track of the unique route each vehicle goes through.
Below, unwanted columns are dropped. The vehicle_id data are dropped because they are only used to keep track of different vehicles. The vehicle_lane data are the names of the roads; we dropped them because we believed they might not affect vehicle emissions. In practice, you should only drop data if you have clear reasoning. For example, the vehicle_electricity values are all zeros, so you can drop them; even if you do not drop them, the machine learning program might be able to figure the relationship out. The vehicle_route data are dropped for the reasoning above. The timestep_time data are dropped because they are just the simulation time.
To-do:
Type your answers to Q:
emission_train = emission_train.drop(columns=["vehicle_CO", "vehicle_HC", "vehicle_NOx", "vehicle_PMx",
"timestep_time", "vehicle_id", "vehicle_lane", "vehicle_electricity",
"vehicle_x", "vehicle_y"])
We separated the block above from the block below because we don't want you to run pd.read_csv and emission_train.drop() twice. Reading a large csv file, as you might have experienced a few minutes ago, takes up quite some RAM and CPU, and running .drop() twice will cause an error message to be printed out.
To-do:
Display the last 100 rows of the emission_train data. It is okay if the displayed rows are truncated in the middle.
display(emission_train.head(100))
display(emission_train.describe())
### Insert your code below ###
display(emission_train.tail(100))
| vehicle_CO2 | vehicle_angle | vehicle_eclass | vehicle_fuel | vehicle_noise | vehicle_pos | vehicle_route | vehicle_speed | vehicle_type | vehicle_waiting | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7380.56 | 50.28 | HBEFA3/HDV | 3.13 | 67.11 | 7.20 | !truck0!var#1 | 0.00 | truck_truck | 0.0 |
| 1 | 2416.04 | 42.25 | HBEFA3/PC_G_EU4 | 1.04 | 65.15 | 5.10 | !veh0!var#1 | 14.72 | veh_passenger | 0.0 |
| 2 | 9898.93 | 50.28 | HBEFA3/HDV | 4.20 | 73.20 | 8.21 | !truck0!var#1 | 1.01 | truck_truck | 0.0 |
| 3 | 0.00 | 42.25 | HBEFA3/PC_G_EU4 | 0.00 | 62.72 | 18.85 | !veh0!var#1 | 13.75 | veh_passenger | 0.0 |
| 4 | 2624.72 | 357.00 | HBEFA3/PC_G_EU4 | 1.13 | 55.94 | 5.10 | !veh1!var#1 | 0.00 | veh_passenger | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | 2578.06 | 0.13 | HBEFA3/LDV_G_EU6 | 1.11 | 63.24 | 35.78 | !moto2!var#1 | 11.62 | moto_motorcycle | 0.0 |
| 96 | 18759.70 | 179.93 | HBEFA3/LDV_G_EU6 | 8.07 | 81.67 | 30.96 | !moto3!var#1 | 13.99 | moto_motorcycle | 0.0 |
| 97 | 6949.38 | 179.93 | HBEFA3/LDV_G_EU6 | 2.99 | 72.45 | 11.88 | !moto4!var#1 | 6.37 | moto_motorcycle | 0.0 |
| 98 | 4292.19 | 1.91 | HBEFA3/LDV_G_EU6 | 1.85 | 71.73 | 5.60 | !moto5!var#1 | 3.30 | moto_motorcycle | 0.0 |
| 99 | 1228.61 | 180.06 | HBEFA3/LDV_G_EU6 | 0.53 | 55.94 | 2.30 | !moto6!var#1 | 0.00 | moto_motorcycle | 0.0 |
100 rows × 10 columns
| vehicle_CO2 | vehicle_angle | vehicle_fuel | vehicle_noise | vehicle_pos | vehicle_speed | vehicle_waiting | |
|---|---|---|---|---|---|---|---|
| count | 1.633101e+07 | 1.633101e+07 | 1.633101e+07 | 1.633101e+07 | 1.633101e+07 | 1.633101e+07 | 1.633101e+07 |
| mean | 4.919050e+03 | 1.633698e+02 | 2.105266e+00 | 6.636207e+01 | 2.162082e+02 | 1.331140e+01 | 3.385107e+00 |
| std | 7.959043e+03 | 1.051232e+02 | 3.389028e+00 | 7.389330e+00 | 6.034189e+02 | 8.833069e+00 | 1.914152e+01 |
| min | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 1.258000e+01 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 |
| 25% | 0.000000e+00 | 9.031000e+01 | 0.000000e+00 | 6.249000e+01 | 2.383000e+01 | 6.550000e+00 | 0.000000e+00 |
| 50% | 2.624720e+03 | 1.799600e+02 | 1.130000e+00 | 6.711000e+01 | 7.199000e+01 | 1.337000e+01 | 0.000000e+00 |
| 75% | 6.161010e+03 | 2.703500e+02 | 2.650000e+00 | 7.112000e+01 | 1.780600e+02 | 1.999000e+01 | 0.000000e+00 |
| max | 1.153026e+05 | 3.600000e+02 | 4.888000e+01 | 1.019600e+02 | 1.943554e+04 | 5.013000e+01 | 3.970000e+02 |
| vehicle_CO2 | vehicle_angle | vehicle_eclass | vehicle_fuel | vehicle_noise | vehicle_pos | vehicle_route | vehicle_speed | vehicle_type | vehicle_waiting | |
|---|---|---|---|---|---|---|---|---|---|---|
| 16330908 | 5293.91 | 1.98 | HBEFA3/Bus | 2.26 | 67.19 | 77.83 | pt_bus_5E:0 | 0.01 | pt_bus | 1.0 |
| 16330909 | 6541.73 | 2.07 | HBEFA3/Bus | 2.79 | 71.21 | 0.69 | pt_bus_5E:0 | 0.70 | pt_bus | 0.0 |
| 16330910 | 10387.44 | 2.06 | HBEFA3/Bus | 4.43 | 74.53 | 2.58 | pt_bus_5E:0 | 1.88 | pt_bus | 0.0 |
| 16330911 | 12058.39 | 1.62 | HBEFA3/Bus | 5.14 | 73.88 | 5.45 | pt_bus_5E:0 | 2.87 | pt_bus | 0.0 |
| 16330912 | 13307.66 | 1.06 | HBEFA3/Bus | 5.67 | 73.64 | 9.19 | pt_bus_5E:0 | 3.74 | pt_bus | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 16331003 | 19817.16 | 0.45 | HBEFA3/Bus | 8.45 | 76.56 | 185.84 | pt_bus_5E:0 | 13.65 | pt_bus | 0.0 |
| 16331004 | 0.00 | 0.45 | HBEFA3/Bus | 0.00 | 74.14 | 199.17 | pt_bus_5E:0 | 13.33 | pt_bus | 0.0 |
| 16331005 | 23192.37 | 0.45 | HBEFA3/Bus | 9.89 | 77.18 | 212.90 | pt_bus_5E:0 | 13.73 | pt_bus | 0.0 |
| 16331006 | 0.00 | 0.45 | HBEFA3/Bus | 0.00 | 74.10 | 226.29 | pt_bus_5E:0 | 13.39 | pt_bus | 0.0 |
| 16331007 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
100 rows × 10 columns
By now, you have already done some cleanup by dropping unwanted columns. Below we use a for loop to cast the data in vehicle_eclass and vehicle_type to strings. As you might notice, the values in both columns are text. However, we found that the data in our csv file cannot be read correctly into TensorFlow, so we added the for loop.
.dropna().reset_index(drop=True) drops the rows that contain NaN in any column and resets the row index.
To-do:
Shuffle emission_train and save a new copy to emission_train_shuffle. Hint: Look at the function we used to extract data for the correlation graph.
Type your answers to Q:
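To see what .dropna().reset_index(drop=True) does, here is a toy example (the values are made up):

```python
import pandas as pd
import numpy as np

# A tiny DataFrame with one NaN row, standing in for the real data
df = pd.DataFrame({"vehicle_CO2": [7380.56, np.nan, 9898.93]})

clean = df.dropna().reset_index(drop=True)
print(clean["vehicle_CO2"].tolist())  # [7380.56, 9898.93] -- NaN row dropped
print(clean.index.tolist())           # [0, 1] -- index renumbered from zero
```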
for header in ["vehicle_eclass", "vehicle_type"]:
emission_train[header] = emission_train[header].astype(str)
emission_train = emission_train.dropna().reset_index(drop=True)
# Shuffle the dataset
emission_train_shuffle = emission_train.sample(frac=1)
### Insert your code below ###
# Display the data pre- and post- shuffle
display(emission_train.head(100))
###FILL IN THE CODE
display(emission_train_shuffle.head(100))
# Get info of the dataframe
###FILL IN THE CODE
display(emission_train.describe())
display(emission_train_shuffle.describe())
| vehicle_CO2 | vehicle_angle | vehicle_eclass | vehicle_fuel | vehicle_noise | vehicle_pos | vehicle_route | vehicle_speed | vehicle_type | vehicle_waiting | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7380.56 | 50.28 | HBEFA3/HDV | 3.13 | 67.11 | 7.20 | !truck0!var#1 | 0.00 | truck_truck | 0.0 |
| 1 | 2416.04 | 42.25 | HBEFA3/PC_G_EU4 | 1.04 | 65.15 | 5.10 | !veh0!var#1 | 14.72 | veh_passenger | 0.0 |
| 2 | 9898.93 | 50.28 | HBEFA3/HDV | 4.20 | 73.20 | 8.21 | !truck0!var#1 | 1.01 | truck_truck | 0.0 |
| 3 | 0.00 | 42.25 | HBEFA3/PC_G_EU4 | 0.00 | 62.72 | 18.85 | !veh0!var#1 | 13.75 | veh_passenger | 0.0 |
| 4 | 2624.72 | 357.00 | HBEFA3/PC_G_EU4 | 1.13 | 55.94 | 5.10 | !veh1!var#1 | 0.00 | veh_passenger | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | 2578.06 | 0.13 | HBEFA3/LDV_G_EU6 | 1.11 | 63.24 | 35.78 | !moto2!var#1 | 11.62 | moto_motorcycle | 0.0 |
| 96 | 18759.70 | 179.93 | HBEFA3/LDV_G_EU6 | 8.07 | 81.67 | 30.96 | !moto3!var#1 | 13.99 | moto_motorcycle | 0.0 |
| 97 | 6949.38 | 179.93 | HBEFA3/LDV_G_EU6 | 2.99 | 72.45 | 11.88 | !moto4!var#1 | 6.37 | moto_motorcycle | 0.0 |
| 98 | 4292.19 | 1.91 | HBEFA3/LDV_G_EU6 | 1.85 | 71.73 | 5.60 | !moto5!var#1 | 3.30 | moto_motorcycle | 0.0 |
| 99 | 1228.61 | 180.06 | HBEFA3/LDV_G_EU6 | 0.53 | 55.94 | 2.30 | !moto6!var#1 | 0.00 | moto_motorcycle | 0.0 |
100 rows × 10 columns
| vehicle_CO2 | vehicle_angle | vehicle_eclass | vehicle_fuel | vehicle_noise | vehicle_pos | vehicle_route | vehicle_speed | vehicle_type | vehicle_waiting | |
|---|---|---|---|---|---|---|---|---|---|---|
| 2834601 | 0.00 | 180.43 | HBEFA3/PC_G_EU4 | 0.00 | 47.71 | 72.82 | !veh4685!var#1 | 4.46 | veh_passenger | 0.0 |
| 12665489 | 5574.18 | 270.37 | HBEFA3/PC_G_EU4 | 2.40 | 66.53 | 18.30 | !veh15679!var#1 | 8.84 | veh_passenger | 0.0 |
| 3755595 | 26332.65 | 180.32 | HBEFA3/HDV | 11.16 | 78.19 | 12.45 | !truck352!var#1 | 15.59 | truck_truck | 0.0 |
| 1695537 | 6544.37 | 90.42 | HBEFA3/PC_G_EU4 | 2.81 | 68.26 | 42.76 | !veh3276!var#1 | 15.21 | veh_passenger | 0.0 |
| 3896343 | 9367.20 | 0.12 | HBEFA3/PC_G_EU4 | 4.03 | 72.51 | 141.08 | !veh4654!var#1 | 23.68 | veh_passenger | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1472395 | 0.00 | 180.19 | HBEFA3/PC_G_EU4 | 0.00 | 55.04 | 116.59 | !veh2374!var#1 | 6.11 | veh_passenger | 0.0 |
| 3685158 | 0.00 | 271.47 | HBEFA3/PC_G_EU4 | 0.00 | 61.63 | 53.35 | !veh5458!var#1 | 13.93 | veh_passenger | 0.0 |
| 14020916 | 0.00 | 179.93 | HBEFA3/PC_G_EU4 | 0.00 | 55.94 | 92.25 | !veh17020!var#1 | 0.00 | veh_passenger | 4.0 |
| 4428417 | 0.00 | 270.24 | HBEFA3/PC_G_EU4 | 0.00 | 72.54 | 760.05 | !veh5819!var#1 | 27.84 | veh_passenger | 0.0 |
| 4388932 | 0.00 | 18.44 | HBEFA3/PC_G_EU4 | 0.00 | 66.96 | 9.30 | !veh5855!var#1 | 18.56 | veh_passenger | 0.0 |
100 rows × 10 columns
| vehicle_CO2 | vehicle_angle | vehicle_fuel | vehicle_noise | vehicle_pos | vehicle_speed | vehicle_waiting | |
|---|---|---|---|---|---|---|---|
| count | 1.633101e+07 | 1.633101e+07 | 1.633101e+07 | 1.633101e+07 | 1.633101e+07 | 1.633101e+07 | 1.633101e+07 |
| mean | 4.919050e+03 | 1.633698e+02 | 2.105266e+00 | 6.636207e+01 | 2.162082e+02 | 1.331140e+01 | 3.385107e+00 |
| std | 7.959043e+03 | 1.051232e+02 | 3.389028e+00 | 7.389330e+00 | 6.034189e+02 | 8.833069e+00 | 1.914152e+01 |
| min | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 1.258000e+01 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 |
| 25% | 0.000000e+00 | 9.031000e+01 | 0.000000e+00 | 6.249000e+01 | 2.383000e+01 | 6.550000e+00 | 0.000000e+00 |
| 50% | 2.624720e+03 | 1.799600e+02 | 1.130000e+00 | 6.711000e+01 | 7.199000e+01 | 1.337000e+01 | 0.000000e+00 |
| 75% | 6.161010e+03 | 2.703500e+02 | 2.650000e+00 | 7.112000e+01 | 1.780600e+02 | 1.999000e+01 | 0.000000e+00 |
| max | 1.153026e+05 | 3.600000e+02 | 4.888000e+01 | 1.019600e+02 | 1.943554e+04 | 5.013000e+01 | 3.970000e+02 |
| vehicle_CO2 | vehicle_angle | vehicle_fuel | vehicle_noise | vehicle_pos | vehicle_speed | vehicle_waiting | |
|---|---|---|---|---|---|---|---|
| count | 1.633101e+07 | 1.633101e+07 | 1.633101e+07 | 1.633101e+07 | 1.633101e+07 | 1.633101e+07 | 1.633101e+07 |
| mean | 4.919050e+03 | 1.633698e+02 | 2.105266e+00 | 6.636207e+01 | 2.162082e+02 | 1.331140e+01 | 3.385107e+00 |
| std | 7.959043e+03 | 1.051232e+02 | 3.389028e+00 | 7.389330e+00 | 6.034189e+02 | 8.833069e+00 | 1.914152e+01 |
| min | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 1.258000e+01 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 |
| 25% | 0.000000e+00 | 9.031000e+01 | 0.000000e+00 | 6.249000e+01 | 2.383000e+01 | 6.550000e+00 | 0.000000e+00 |
| 50% | 2.624720e+03 | 1.799600e+02 | 1.130000e+00 | 6.711000e+01 | 7.199000e+01 | 1.337000e+01 | 0.000000e+00 |
| 75% | 6.161010e+03 | 2.703500e+02 | 2.650000e+00 | 7.112000e+01 | 1.780600e+02 | 1.999000e+01 | 0.000000e+00 |
| max | 1.153026e+05 | 3.600000e+02 | 4.888000e+01 | 1.019600e+02 | 1.943554e+04 | 5.013000e+01 | 3.970000e+02 |
Before you proceed, make sure you finish reading "Machine Learning Introduction" in Step 3 of the lab. You should complete the Tensorflow playground exercise and take a screenshot of your results.
In machine learning, we often want to split our data into Training Set, Validation Set, and Test Set.
A typical workflow will be: train the model on the Training Set, tune the hyperparameters using the Validation Set, and evaluate the final model once on the Test Set.
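The chained train_test_split() calls used below work like this sketch (the sizes are illustrative, not the lab's actual split):

```python
from sklearn.model_selection import train_test_split
import numpy as np

data = np.arange(1000)

# First split off a test set, then carve a validation set out of what remains.
train, test = train_test_split(data, test_size=0.1)  # 900 train, 100 test
train, val = train_test_split(train, test_size=0.1)  # 810 train, 90 validation

print(len(train), len(val), len(test))  # 810 90 100
```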
To-Do:
Keep test_size=0.99 in the first split for now. Adjust the test_size= values for splitting train_df, test_df, and val_df.
random.seed(20211120)
tf.random.set_seed(20211120)
np.random.seed(20211120)
train_df, backup_df = train_test_split(emission_train_shuffle, test_size=0.99) # Comment this line for large data training
# Edit the test_size below.
# train_df, test_df = train_test_split(emission_train_shuffle, test_size=0.1) # Uncomment for large dataset
train_df, test_df = train_test_split(train_df, test_size=0.1) # Comment for large dataset
train_df, val_df = train_test_split(train_df, test_size=0.1)
print(len(backup_df), 'backup data')
print(len(train_df), 'train examples')
print(len(val_df), 'validation examples')
print(len(test_df), 'test examples')
#del emission_train
16167697 backup data
132281 train examples
14698 validation examples
16331 test examples
Sometimes when there are huge value differences between input features, we want to scale them to get a better training result. In this lab you are not required to use normalization, but if you cannot get a nice machine learning result, you can try normalizing the data. Below, we use Z normalization, which is just one normalization method. If you normalize your training data, make sure to also normalize the validation and test data. Note that train_df_norm = train_df won't copy train_df to train_df_norm: changing the values in train_df_norm will also change the values in train_df. So if you decide to revert the normalization after you run the code block below, run the code block under "Split Data for Machine Learning" again and then run only the train_df_norm = train_df lines below. (Comment out the rest of the code using the # sign.)
Z Normalization Equation: \begin{equation*} z = \frac{x - \mu}{\sigma} \\ z: \text{Normalized Data} \\ x: \text{Original Data} \\ \mu: \text{Mean of }x \\ \sigma: \text{Standard Deviation of }x \\ \end{equation*}
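The aliasing pitfall mentioned above can be seen in a toy example (made-up values; .copy() is shown only for contrast, since the lab code below deliberately keeps the alias):

```python
import pandas as pd

df = pd.DataFrame({"vehicle_noise": [60.0, 65.0, 70.0]})

alias = df                    # NOT a copy: both names refer to the same object
alias["vehicle_noise"] = 0.0
print(df["vehicle_noise"].tolist())    # [0.0, 0.0, 0.0] -- the original changed too

df = pd.DataFrame({"vehicle_noise": [60.0, 65.0, 70.0]})
norm = df.copy()              # an independent copy
mean, std = df["vehicle_noise"].mean(), df["vehicle_noise"].std()
norm["vehicle_noise"] = (norm["vehicle_noise"] - mean) / std  # Z normalization
print(norm["vehicle_noise"].tolist())  # [-1.0, 0.0, 1.0]
print(df["vehicle_noise"].tolist())    # [60.0, 65.0, 70.0] -- original unchanged
```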
# Z-Score Normalizing
train_df_norm = train_df
val_df_norm = val_df
test_df_norm = test_df
for header in ["vehicle_noise", "vehicle_speed", "vehicle_waiting"]:
    mean = train_df[header].mean()
    std = train_df[header].std()
    train_df_norm[header] = (train_df[header] - mean) / std
    train_df_norm[header] = train_df_norm[header].fillna(0)
    ### Insert your code below (optional) ###
    # Normalize the validation data
    val_df_norm[header] = (val_df[header] - mean) / std
    val_df_norm[header] = val_df_norm[header].fillna(0)
    # Normalize the test data
    test_df_norm[header] = (test_df[header] - mean) / std
    test_df_norm[header] = test_df_norm[header].fillna(0)
print(train_df_norm.head())
vehicle_CO2 vehicle_angle vehicle_eclass vehicle_fuel \
7638310 2549.70 203.05 HBEFA3/PC_G_EU4 1.10
11969572 2624.72 181.45 HBEFA3/PC_G_EU4 1.13
16068406 5286.11 269.35 HBEFA3/Bus 2.25
2078171 4237.56 4.17 HBEFA3/PC_G_EU4 1.82
8433552 0.00 55.50 HBEFA3/PC_G_EU4 0.00
vehicle_noise vehicle_pos vehicle_route vehicle_speed \
7638310 -1.075224 2.67 !veh11122!var#1 -1.201680
11969572 -1.411014 77.33 !veh16062!var#1 -1.507755
16068406 0.101393 52.65 pt_bus_9B:0 -1.507755
2078171 0.045879 227.06 !veh2109!var#1 0.251605
8433552 -2.934252 7.53 !veh10684!var#1 -1.141599
vehicle_type vehicle_waiting
7638310 veh_passenger -0.178627
11969572 veh_passenger 0.336253
16068406 pt_bus 1.520476
2078171 veh_passenger -0.178627
8433552 veh_passenger -0.178627
We need to define our feature columns so that the program knows what types of features are used in the training. In the emission data, there are two types of features: numeric (floating point, int, etc.) and categorical/indicator (for example, 'color' or 'gender'; a 'color' column can contain 'red', 'blue', etc.).
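What an indicator column amounts to can be sketched with plain pandas one-hot encoding (this is only an illustration; the lab itself uses tf.feature_column below):

```python
import pandas as pd

df = pd.DataFrame({"vehicle_type": ["veh_passenger", "pt_bus", "veh_passenger"]})

# One-hot encode the categorical column, much like an indicator column does
onehot = pd.get_dummies(df["vehicle_type"], dtype=int)
print(sorted(onehot.columns.tolist()))  # ['pt_bus', 'veh_passenger']
print(onehot["pt_bus"].tolist())        # [0, 1, 0]
```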
To Do:
Display train_df.
| vehicle_CO2 | vehicle_angle | vehicle_eclass | vehicle_fuel | vehicle_noise | vehicle_pos | vehicle_route | vehicle_speed | vehicle_type | vehicle_waiting | |
|---|---|---|---|---|---|---|---|---|---|---|
| 7638310 | 2549.70 | 203.05 | HBEFA3/PC_G_EU4 | 1.10 | -1.075224 | 2.67 | !veh11122!var#1 | -1.201680 | veh_passenger | -0.178627 |
| 11969572 | 2624.72 | 181.45 | HBEFA3/PC_G_EU4 | 1.13 | -1.411014 | 77.33 | !veh16062!var#1 | -1.507755 | veh_passenger | 0.336253 |
| 16068406 | 5286.11 | 269.35 | HBEFA3/Bus | 2.25 | 0.101393 | 52.65 | pt_bus_9B:0 | -1.507755 | pt_bus | 1.520476 |
| 2078171 | 4237.56 | 4.17 | HBEFA3/PC_G_EU4 | 1.82 | 0.045879 | 227.06 | !veh2109!var#1 | 0.251605 | veh_passenger | -0.178627 |
| 8433552 | 0.00 | 55.50 | HBEFA3/PC_G_EU4 | 0.00 | -2.934252 | 7.53 | !veh10684!var#1 | -1.141599 | veh_passenger | -0.178627 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 140030 | 4581.15 | 181.14 | HBEFA3/PC_G_EU4 | 1.97 | -0.187007 | 7.49 | !veh724!var#1 | -0.718763 | veh_passenger | -0.178627 |
| 2048360 | 0.00 | 1.27 | HBEFA3/PC_G_EU4 | 0.00 | -1.412368 | 280.25 | !veh2561!var#1 | -1.507755 | veh_passenger | -0.024163 |
| 7157891 | 0.00 | 90.80 | HBEFA3/PC_G_EU4 | 0.00 | -0.345424 | 181.85 | !veh10553!var#1 | -0.055603 | veh_passenger | -0.178627 |
| 13558792 | 3474.49 | 94.34 | HBEFA3/PC_G_EU4 | 1.49 | -0.092228 | 7.14 | !veh18582!var#1 | -1.276499 | veh_passenger | -0.178627 |
| 8306460 | 3394.11 | 271.91 | HBEFA3/PC_G_EU4 | 1.46 | 0.311261 | 22.24 | !veh10766!var#1 | 0.765130 | veh_passenger | -0.178627 |
132281 rows × 10 columns
# Create an empty list
feature_cols = []
# Numeric Columns
for header in ["vehicle_fuel", "vehicle_speed", "vehicle_angle", "vehicle_noise"]: ### Finish the list on the left
col = tf.feature_column.numeric_column(header)
feature_cols.append(col)
### Insert your code ###
# Indicator Columns
indicator_col_names = ["vehicle_eclass", "vehicle_type"]
for col_name in indicator_col_names:
categorical_column = tf.feature_column.categorical_column_with_vocabulary_list(col_name,
train_df[col_name].unique())
indicator_column = tf.feature_column.indicator_column(categorical_column)
feature_cols.append(indicator_column)
print("Feature columns: ", feature_cols, "\n")
Feature columns: [NumericColumn(key='vehicle_fuel', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='vehicle_speed', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='vehicle_angle', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='vehicle_noise', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), IndicatorColumn(categorical_column=VocabularyListCategoricalColumn(key='vehicle_eclass', vocabulary_list=('HBEFA3/PC_G_EU4', 'HBEFA3/Bus', 'HBEFA3/HDV', 'HBEFA3/LDV_G_EU6'), dtype=tf.string, default_value=-1, num_oov_buckets=0)), IndicatorColumn(categorical_column=VocabularyListCategoricalColumn(key='vehicle_type', vocabulary_list=('veh_passenger', 'pt_bus', 'truck_truck', 'moto_motorcycle', 'bus_bus'), dtype=tf.string, default_value=-1, num_oov_buckets=0))]
The feature layer will be the input to our machine learning model. We need to create a feature layer to be added into the model.
# Create a feature layer for tf
feature_layer = tf.keras.layers.DenseFeatures(feature_cols, name='Features')
model.add(): adds a layer to the model
In tf.keras.layers.Dense():
units: the number of nodes in that layer
activation: the activation function used in that layer
kernel_regularizer: the regularization function used in that layer
name: just a label for us to keep track of layers and debug
In model.compile():
optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate): used to improve the performance of the training
Adam: a stochastic gradient descent method
loss: updates the model according to the specified loss function
metrics: evaluates the model according to the specified metrics
We first split our Pandas dataframe into features and labels.
Then model.fit() trains our model.
logdir and tensorboard_callback are used to save training logs for TensorBoard.
Notice that there are two model.fit() function calls, one of which is commented out. The one without callbacks=[tensorboard_callback] is used for large dataset training.
As we mentioned in the lab document, hyperparameters affect the performance of your model. In the following blocks, you will train your model. We also want you to experience training on both a small dataset and a large dataset.
To-do:
Small Dataset:
The program cells you ran until now prepare you for small dataset training. You don't need to adjust the test_size=0.99 in "Split Data for Machine Learning".
Adjust the Hyperparameters (learning rate, batch size, epochs, hidden layer number, node number). Add additional hidden layers as needed. Remember, a large learning rate might cause the model to never converge, but a very small learning rate will make the model converge very slowly. If your mse (mean squared error) is decreasing but your program finishes before the mse reaches a small number, increase your epochs. Lastly, start with a small batch size. A smaller batch size often gives a better training result; a large batch size often causes poor convergence, and it might also lead to poor generalization and slow training. Try batch sizes of 100, 500, and 1000.
In the function definitions (previous code block):
$^{M2}$Once you get a result with nice mse, run the block %tensorboard --logdir logs. Then take screenshots that show your epoch_loss and your epoch_mse.
Large Dataset:
Adjust the codes in "Split Data for Machine Learning" so that no data go to backup_df.
Go to previous code block and use the model.fit() without callbacks=[tensorboard_callback]. Remember to comment out the one with callbacks=[tensorboard_callback].
Adjust the Hyperparameters (learning rate, batch size, epochs, hidden layer number, node number). Remember, a large learning rate might cause the model to never converge, but a very small learning rate will make the model converge very slowly. If your mse (mean squared error) is decreasing but your program finishes before the mse reaches a small number, increase your epochs. A smaller batch size often gives a better training result; a large batch size often causes poor convergence, and it might also lead to poor generalization and slow training. Try batch sizes of 1000, 10000, and 200000. $^{M3}$Q: Do you notice any difference between using batch sizes of 1000, 10000, and 200000?
In the function definitions:
$^{M4}$The program will run for a longer time with the large dataset input. Once you get a result with a nice mse, you don't have to run %tensorboard --logdir logs. Move on to the sections below. We will have you save a PDF once you reach the end of this Notebook. We will look at your training for the large dataset based on the logs printed out during each epoch.
Note: Ignore the warnings at the beginning and at the end.
Type your answers to Q:
# Split the datasets into features and label.
train_lbl = np.array(train_df["vehicle_CO2"])
train_df = train_df.drop(columns=["vehicle_CO2"])
train_ft = {name: np.array(value) for name, value in train_df.items()}
# train_lbl = np.array(train_ft.pop(label_name))
val_lbl = np.array(val_df["vehicle_CO2"])
val_df = val_df.drop(columns=["vehicle_CO2"])
val_ft = {name: np.array(value) for name, value in val_df.items()}
test_lbl = np.array(test_df["vehicle_CO2"])
test_df = test_df.drop(columns=["vehicle_CO2"])
test_ft = {name: np.array(value) for name, value in test_df.items()}
# Hyperparameters
learning_rate = 0.00025
epochs = 100
batch_size = 200
# Label
label_name = "vehicle_CO2"
shuffle = True
#---Create a sequential model---#
model = tf.keras.models.Sequential([
# Add the feature layer
feature_layer,
# First hidden layer with 50 nodes
tf.keras.layers.Dense(units=50,
activation='relu',
kernel_regularizer=tf.keras.regularizers.l1(l=0.01),
name='Hidden1'),
# Second hidden layer with 20 nodes
tf.keras.layers.Dense(units=20,
activation='relu',
kernel_regularizer=tf.keras.regularizers.l1(l=0.01),
name='Hidden2'),
# Third hidden layer with 10 nodes
tf.keras.layers.Dense(units=10,
activation='relu',
kernel_regularizer=tf.keras.regularizers.l1(l=0.01),
name='Hidden3'),
# Output layer
tf.keras.layers.Dense(units=1,
activation='softplus',
name='Output')
])
# Note: the `lr` argument is deprecated; use `learning_rate` instead.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
loss=tf.keras.losses.MeanSquaredError(),
metrics=['mse'])
#---Train the Model---#
# Keras TensorBoard callback.
logdir = "logs/fit/" + datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=logdir)
model.fit(x=train_ft, y=train_lbl, batch_size=batch_size,
epochs=epochs, callbacks=[tensorboard_callback], validation_data=(val_ft, val_lbl), shuffle=shuffle)
# Training function for large training set
# model.fit(x=train_ft, y=train_lbl, batch_size=batch_size,
# epochs=epochs, verbose=2, validation_data=(val_ft, val_lbl), shuffle=shuffle)
2021-11-20 21:48:37.862127: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory 2021-11-20 21:48:37.862196: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303) 2021-11-20 21:48:37.862222: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (DESKTOP-MCGAV6B): /proc/driver/nvidia/version does not exist 2021-11-20 21:48:37.862496: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. /home/nick/.virtualenvs/iotlab4/lib/python3.8/site-packages/keras/optimizer_v2/adam.py:105: UserWarning: The `lr` argument is deprecated, use `learning_rate` instead. super(Adam, self).__init__(name, **kwargs)
Epoch 1/100
WARNING:tensorflow:Layers in a Sequential model should only have a single input tensor, but we receive a <class 'dict'> input: {'vehicle_angle': <tf.Tensor 'IteratorGetNext:0' shape=(None,) dtype=float32>, 'vehicle_eclass': <tf.Tensor 'IteratorGetNext:1' shape=(None,) dtype=string>, 'vehicle_fuel': <tf.Tensor 'IteratorGetNext:2' shape=(None,) dtype=float32>, 'vehicle_noise': <tf.Tensor 'IteratorGetNext:3' shape=(None,) dtype=float32>, 'vehicle_pos': <tf.Tensor 'IteratorGetNext:4' shape=(None,) dtype=float32>, 'vehicle_route': <tf.Tensor 'IteratorGetNext:5' shape=(None,) dtype=string>, 'vehicle_speed': <tf.Tensor 'IteratorGetNext:6' shape=(None,) dtype=float32>, 'vehicle_type': <tf.Tensor 'IteratorGetNext:7' shape=(None,) dtype=string>, 'vehicle_waiting': <tf.Tensor 'IteratorGetNext:8' shape=(None,) dtype=float32>}
Consider rewriting this model with the Functional API.
WARNING:tensorflow:Layers in a Sequential model should only have a single input tensor, but we receive a <class 'dict'> input: {'vehicle_angle': <tf.Tensor 'IteratorGetNext:0' shape=(None,) dtype=float32>, 'vehicle_eclass': <tf.Tensor 'IteratorGetNext:1' shape=(None,) dtype=string>, 'vehicle_fuel': <tf.Tensor 'IteratorGetNext:2' shape=(None,) dtype=float32>, 'vehicle_noise': <tf.Tensor 'IteratorGetNext:3' shape=(None,) dtype=float32>, 'vehicle_pos': <tf.Tensor 'IteratorGetNext:4' shape=(None,) dtype=float32>, 'vehicle_route': <tf.Tensor 'IteratorGetNext:5' shape=(None,) dtype=string>, 'vehicle_speed': <tf.Tensor 'IteratorGetNext:6' shape=(None,) dtype=float32>, 'vehicle_type': <tf.Tensor 'IteratorGetNext:7' shape=(None,) dtype=string>, 'vehicle_waiting': <tf.Tensor 'IteratorGetNext:8' shape=(None,) dtype=float32>}
Consider rewriting this model with the Functional API.
656/662 [============================>.] - ETA: 0s - loss: 79330544.0000 - mse: 79330544.0000WARNING:tensorflow:Layers in a Sequential model should only have a single input tensor, but we receive a <class 'dict'> input: {'vehicle_angle': <tf.Tensor 'IteratorGetNext:0' shape=(None,) dtype=float32>, 'vehicle_eclass': <tf.Tensor 'IteratorGetNext:1' shape=(None,) dtype=string>, 'vehicle_fuel': <tf.Tensor 'IteratorGetNext:2' shape=(None,) dtype=float32>, 'vehicle_noise': <tf.Tensor 'IteratorGetNext:3' shape=(None,) dtype=float32>, 'vehicle_pos': <tf.Tensor 'IteratorGetNext:4' shape=(None,) dtype=float32>, 'vehicle_route': <tf.Tensor 'IteratorGetNext:5' shape=(None,) dtype=string>, 'vehicle_speed': <tf.Tensor 'IteratorGetNext:6' shape=(None,) dtype=float32>, 'vehicle_type': <tf.Tensor 'IteratorGetNext:7' shape=(None,) dtype=string>, 'vehicle_waiting': <tf.Tensor 'IteratorGetNext:8' shape=(None,) dtype=float32>}
Consider rewriting this model with the Functional API.
662/662 [==============================] - 3s 4ms/step - loss: 79184344.0000 - mse: 79184344.0000 - val_loss: 70727368.0000 - val_mse: 70727368.0000
Epoch 2/100
662/662 [==============================] - 2s 3ms/step - loss: 65018128.0000 - mse: 65018128.0000 - val_loss: 63399000.0000 - val_mse: 63398996.0000
Epoch 3/100
662/662 [==============================] - 2s 3ms/step - loss: 49964048.0000 - mse: 49964044.0000 - val_loss: 35209080.0000 - val_mse: 35209076.0000
Epoch 4/100
662/662 [==============================] - 2s 3ms/step - loss: 16127032.0000 - mse: 16127020.0000 - val_loss: 5357274.5000 - val_mse: 5357267.0000
Epoch 5/100
662/662 [==============================] - 2s 3ms/step - loss: 3090708.2500 - mse: 3090700.0000 - val_loss: 2276020.7500 - val_mse: 2276012.7500
Epoch 6/100
662/662 [==============================] - 2s 3ms/step - loss: 1692573.6250 - mse: 1692566.1250 - val_loss: 1372377.0000 - val_mse: 1372368.7500
Epoch 7/100
662/662 [==============================] - 2s 3ms/step - loss: 1007589.5625 - mse: 1007581.9375 - val_loss: 809112.5000 - val_mse: 809104.5000
Epoch 8/100
662/662 [==============================] - 2s 3ms/step - loss: 560044.6250 - mse: 560036.4375 - val_loss: 424189.8750 - val_mse: 424182.3750
Epoch 9/100
662/662 [==============================] - 2s 3ms/step - loss: 279927.9375 - mse: 279920.8125 - val_loss: 194143.5469 - val_mse: 194136.0000
Epoch 10/100
662/662 [==============================] - 2s 4ms/step - loss: 120130.2734 - mse: 120122.6172 - val_loss: 81550.8281 - val_mse: 81543.2969
Epoch 11/100
662/662 [==============================] - 2s 4ms/step - loss: 52648.6094 - mse: 52641.1211 - val_loss: 39710.3633 - val_mse: 39702.8281
Epoch 12/100
662/662 [==============================] - 2s 4ms/step - loss: 28584.6523 - mse: 28577.1328 - val_loss: 24565.3086 - val_mse: 24557.7852
Epoch 13/100
662/662 [==============================] - 2s 4ms/step - loss: 18964.7520 - mse: 18957.2441 - val_loss: 16506.0078 - val_mse: 16498.4941
Epoch 14/100
662/662 [==============================] - 2s 4ms/step - loss: 13263.4209 - mse: 13255.9150 - val_loss: 11562.5117 - val_mse: 11555.0176
Epoch 15/100
662/662 [==============================] - 2s 4ms/step - loss: 9290.7842 - mse: 9283.2930 - val_loss: 8341.3701 - val_mse: 8333.8809
Epoch 16/100
662/662 [==============================] - 3s 4ms/step - loss: 6590.5698 - mse: 6583.0913 - val_loss: 5607.2427 - val_mse: 5599.7637
Epoch 17/100
662/662 [==============================] - 2s 4ms/step - loss: 4637.5386 - mse: 4630.0630 - val_loss: 3949.8479 - val_mse: 3942.3826
Epoch 18/100
662/662 [==============================] - 2s 4ms/step - loss: 3266.8867 - mse: 3259.4277 - val_loss: 2739.9700 - val_mse: 2732.5144
Epoch 19/100
662/662 [==============================] - 2s 4ms/step - loss: 2243.1565 - mse: 2235.7070 - val_loss: 1812.6089 - val_mse: 1805.1610
Epoch 20/100
662/662 [==============================] - 2s 4ms/step - loss: 1562.9683 - mse: 1555.5232 - val_loss: 1342.1046 - val_mse: 1334.6573
Epoch 21/100
662/662 [==============================] - 2s 4ms/step - loss: 1130.0829 - mse: 1122.6382 - val_loss: 896.1839 - val_mse: 888.7385
Epoch 22/100
662/662 [==============================] - 2s 4ms/step - loss: 849.8592 - mse: 842.4114 - val_loss: 754.9175 - val_mse: 747.4648
Epoch 23/100
662/662 [==============================] - 2s 4ms/step - loss: 743.2767 - mse: 735.8193 - val_loss: 596.7343 - val_mse: 589.2687
Epoch 24/100
662/662 [==============================] - 3s 4ms/step - loss: 597.7314 - mse: 590.2531 - val_loss: 474.6888 - val_mse: 467.1915
Epoch 25/100
662/662 [==============================] - 2s 3ms/step - loss: 480.4693 - mse: 472.9565 - val_loss: 379.8245 - val_mse: 372.2978
Epoch 26/100
662/662 [==============================] - 2s 4ms/step - loss: 385.4793 - mse: 377.9421 - val_loss: 405.6169 - val_mse: 398.0688
Epoch 27/100
662/662 [==============================] - 2s 4ms/step - loss: 349.0853 - mse: 341.5297 - val_loss: 372.6129 - val_mse: 365.0504
Epoch 28/100
662/662 [==============================] - 2s 4ms/step - loss: 315.6259 - mse: 308.0577 - val_loss: 335.0668 - val_mse: 327.4929
Epoch 29/100
662/662 [==============================] - 2s 3ms/step - loss: 296.7340 - mse: 289.1563 - val_loss: 226.8197 - val_mse: 219.2377
Epoch 30/100
662/662 [==============================] - 2s 4ms/step - loss: 267.8750 - mse: 260.2901 - val_loss: 235.0442 - val_mse: 227.4561
Epoch 31/100
662/662 [==============================] - 2s 4ms/step - loss: 252.7023 - mse: 245.1116 - val_loss: 276.1638 - val_mse: 268.5700
Epoch 32/100
662/662 [==============================] - 2s 4ms/step - loss: 228.7534 - mse: 221.1579 - val_loss: 184.3275 - val_mse: 176.7292
Epoch 33/100
662/662 [==============================] - 2s 4ms/step - loss: 217.7415 - mse: 210.1419 - val_loss: 183.8622 - val_mse: 176.2604
Epoch 34/100
662/662 [==============================] - 2s 4ms/step - loss: 213.4106 - mse: 205.8080 - val_loss: 176.6708 - val_mse: 169.0678
Epoch 35/100
662/662 [==============================] - 2s 4ms/step - loss: 195.8635 - mse: 188.2589 - val_loss: 261.1991 - val_mse: 253.5918
Epoch 36/100
662/662 [==============================] - 2s 3ms/step - loss: 186.6559 - mse: 179.0488 - val_loss: 188.1329 - val_mse: 180.5242
Epoch 37/100
662/662 [==============================] - 2s 4ms/step - loss: 168.9338 - mse: 161.3241 - val_loss: 151.6991 - val_mse: 144.0888
Epoch 38/100
662/662 [==============================] - 2s 3ms/step - loss: 166.4013 - mse: 158.7903 - val_loss: 126.9212 - val_mse: 119.3103
Epoch 39/100
662/662 [==============================] - 2s 3ms/step - loss: 162.2242 - mse: 154.6139 - val_loss: 173.9848 - val_mse: 166.3758
Epoch 40/100
662/662 [==============================] - 2s 3ms/step - loss: 155.0030 - mse: 147.3941 - val_loss: 369.8058 - val_mse: 362.1980
Epoch 41/100
662/662 [==============================] - 2s 3ms/step - loss: 143.8550 - mse: 136.2464 - val_loss: 238.0962 - val_mse: 230.4865
Epoch 42/100
662/662 [==============================] - 2s 3ms/step - loss: 140.4258 - mse: 132.8163 - val_loss: 103.0920 - val_mse: 95.4828
Epoch 43/100
662/662 [==============================] - 2s 3ms/step - loss: 126.0178 - mse: 118.4073 - val_loss: 106.7823 - val_mse: 99.1694
Epoch 44/100
662/662 [==============================] - 2s 3ms/step - loss: 120.4682 - mse: 112.8540 - val_loss: 176.0248 - val_mse: 168.4082
Epoch 45/100
662/662 [==============================] - 2s 3ms/step - loss: 131.9019 - mse: 124.2833 - val_loss: 194.4661 - val_mse: 186.8451
Epoch 46/100
662/662 [==============================] - 2s 3ms/step - loss: 112.4006 - mse: 104.7775 - val_loss: 88.1086 - val_mse: 80.4838
Epoch 47/100
662/662 [==============================] - 2s 3ms/step - loss: 108.4361 - mse: 100.8089 - val_loss: 156.1286 - val_mse: 148.4997
Epoch 48/100
662/662 [==============================] - 2s 3ms/step - loss: 129.2126 - mse: 121.5826 - val_loss: 160.3470 - val_mse: 152.7151
Epoch 49/100
662/662 [==============================] - 2s 3ms/step - loss: 96.5157 - mse: 88.8832 - val_loss: 83.4677 - val_mse: 75.8335
Epoch 50/100
662/662 [==============================] - 2s 3ms/step - loss: 104.0960 - mse: 96.4595 - val_loss: 115.8624 - val_mse: 108.2236
Epoch 51/100
662/662 [==============================] - 2s 3ms/step - loss: 103.9950 - mse: 96.3543 - val_loss: 96.5157 - val_mse: 88.8730
Epoch 52/100
662/662 [==============================] - 2s 3ms/step - loss: 110.2892 - mse: 102.6449 - val_loss: 79.4264 - val_mse: 71.7808
Epoch 53/100
662/662 [==============================] - 2s 3ms/step - loss: 96.4508 - mse: 88.8035 - val_loss: 100.0389 - val_mse: 92.3898
Epoch 54/100
662/662 [==============================] - 2s 3ms/step - loss: 93.1556 - mse: 85.5042 - val_loss: 402.6776 - val_mse: 395.0251
Epoch 55/100
662/662 [==============================] - 2s 3ms/step - loss: 109.6097 - mse: 101.9547 - val_loss: 157.7052 - val_mse: 150.0488
Epoch 56/100
662/662 [==============================] - 2s 3ms/step - loss: 96.8001 - mse: 89.1417 - val_loss: 95.8780 - val_mse: 88.2173
Epoch 57/100
662/662 [==============================] - 2s 3ms/step - loss: 93.0554 - mse: 85.3929 - val_loss: 76.2172 - val_mse: 68.5541
Epoch 58/100
662/662 [==============================] - 2s 3ms/step - loss: 91.4991 - mse: 83.8344 - val_loss: 71.3347 - val_mse: 63.6681
Epoch 59/100
662/662 [==============================] - 2s 3ms/step - loss: 109.8334 - mse: 102.1658 - val_loss: 130.9901 - val_mse: 123.3204
Epoch 60/100
662/662 [==============================] - 2s 3ms/step - loss: 87.7392 - mse: 80.0689 - val_loss: 69.0914 - val_mse: 61.4202
Epoch 61/100
662/662 [==============================] - 2s 3ms/step - loss: 86.0295 - mse: 78.3577 - val_loss: 137.1095 - val_mse: 129.4364
Epoch 62/100
662/662 [==============================] - 2s 3ms/step - loss: 91.6068 - mse: 83.9328 - val_loss: 68.5680 - val_mse: 60.8936
Epoch 63/100
662/662 [==============================] - 2s 3ms/step - loss: 86.4056 - mse: 78.7302 - val_loss: 65.8676 - val_mse: 58.1911
Epoch 64/100
662/662 [==============================] - 2s 3ms/step - loss: 91.6171 - mse: 83.9403 - val_loss: 67.8274 - val_mse: 60.1502
Epoch 65/100
662/662 [==============================] - 2s 3ms/step - loss: 88.6728 - mse: 80.9952 - val_loss: 77.6899 - val_mse: 70.0111
Epoch 66/100
662/662 [==============================] - 2s 3ms/step - loss: 83.9364 - mse: 76.2580 - val_loss: 72.8364 - val_mse: 65.1573
Epoch 67/100
662/662 [==============================] - 2s 3ms/step - loss: 90.8665 - mse: 83.1870 - val_loss: 76.1039 - val_mse: 68.4234
Epoch 68/100
662/662 [==============================] - 2s 3ms/step - loss: 86.8273 - mse: 79.1474 - val_loss: 112.5748 - val_mse: 104.8947
Epoch 69/100
662/662 [==============================] - 2s 3ms/step - loss: 89.4757 - mse: 81.7952 - val_loss: 63.8292 - val_mse: 56.1476
Epoch 70/100
662/662 [==============================] - 2s 3ms/step - loss: 83.6806 - mse: 75.9991 - val_loss: 150.1401 - val_mse: 142.4585
Epoch 71/100
662/662 [==============================] - 2s 3ms/step - loss: 83.8634 - mse: 76.1812 - val_loss: 64.5143 - val_mse: 56.8313
Epoch 72/100
662/662 [==============================] - 2s 3ms/step - loss: 86.4715 - mse: 78.7868 - val_loss: 107.5888 - val_mse: 99.9037
Epoch 73/100
662/662 [==============================] - 2s 3ms/step - loss: 84.0807 - mse: 76.3943 - val_loss: 83.1718 - val_mse: 75.4857
Epoch 74/100
662/662 [==============================] - 2s 3ms/step - loss: 82.3913 - mse: 74.7021 - val_loss: 176.5017 - val_mse: 168.8111
Epoch 75/100
662/662 [==============================] - 2s 3ms/step - loss: 86.8000 - mse: 79.1083 - val_loss: 72.1226 - val_mse: 64.4287
Epoch 76/100
662/662 [==============================] - 2s 3ms/step - loss: 86.5047 - mse: 78.8102 - val_loss: 63.9386 - val_mse: 56.2433
Epoch 77/100
662/662 [==============================] - 2s 3ms/step - loss: 81.0029 - mse: 73.3064 - val_loss: 67.7209 - val_mse: 60.0233
Epoch 78/100
662/662 [==============================] - 2s 3ms/step - loss: 80.0911 - mse: 72.3917 - val_loss: 59.3782 - val_mse: 51.6779
Epoch 79/100
662/662 [==============================] - 2s 3ms/step - loss: 83.0124 - mse: 75.3110 - val_loss: 232.4343 - val_mse: 224.7319
Epoch 80/100
662/662 [==============================] - 2s 3ms/step - loss: 78.2458 - mse: 70.5423 - val_loss: 65.7999 - val_mse: 58.0953
Epoch 81/100
662/662 [==============================] - 2s 3ms/step - loss: 85.9389 - mse: 78.2333 - val_loss: 66.6091 - val_mse: 58.9020
Epoch 82/100
662/662 [==============================] - 2s 3ms/step - loss: 76.0563 - mse: 68.3487 - val_loss: 69.1832 - val_mse: 61.4747
Epoch 83/100
662/662 [==============================] - 2s 3ms/step - loss: 85.4043 - mse: 77.6949 - val_loss: 63.2030 - val_mse: 55.4930
Epoch 84/100
662/662 [==============================] - 2s 3ms/step - loss: 73.5612 - mse: 65.8489 - val_loss: 69.1200 - val_mse: 61.4062
Epoch 85/100
662/662 [==============================] - 2s 3ms/step - loss: 83.0800 - mse: 75.3660 - val_loss: 58.9890 - val_mse: 51.2741
Epoch 86/100
662/662 [==============================] - 2s 3ms/step - loss: 72.7144 - mse: 64.9987 - val_loss: 60.7145 - val_mse: 52.9978
Epoch 87/100
662/662 [==============================] - 2s 3ms/step - loss: 78.4185 - mse: 70.7007 - val_loss: 56.1925 - val_mse: 48.4729
Epoch 88/100
662/662 [==============================] - 2s 3ms/step - loss: 79.3547 - mse: 71.6343 - val_loss: 65.6896 - val_mse: 57.9678
Epoch 89/100
662/662 [==============================] - 2s 3ms/step - loss: 76.8642 - mse: 69.1408 - val_loss: 95.4288 - val_mse: 87.7053
Epoch 90/100
662/662 [==============================] - 2s 3ms/step - loss: 74.1398 - mse: 66.4144 - val_loss: 55.6624 - val_mse: 47.9354
Epoch 91/100
662/662 [==============================] - 2s 3ms/step - loss: 79.8749 - mse: 72.1469 - val_loss: 61.7292 - val_mse: 54.0004
Epoch 92/100
662/662 [==============================] - 2s 3ms/step - loss: 78.9912 - mse: 71.2609 - val_loss: 53.7035 - val_mse: 45.9720
Epoch 93/100
662/662 [==============================] - 2s 3ms/step - loss: 75.8728 - mse: 68.1391 - val_loss: 57.1908 - val_mse: 49.4551
Epoch 94/100
662/662 [==============================] - 2s 3ms/step - loss: 74.8501 - mse: 67.1128 - val_loss: 72.1238 - val_mse: 64.3837
Epoch 95/100
662/662 [==============================] - 2s 3ms/step - loss: 75.5248 - mse: 67.7846 - val_loss: 263.8614 - val_mse: 256.1207
Epoch 96/100
662/662 [==============================] - 2s 3ms/step - loss: 82.8836 - mse: 75.1415 - val_loss: 85.4384 - val_mse: 77.6948
Epoch 97/100
662/662 [==============================] - 2s 3ms/step - loss: 73.7890 - mse: 66.0442 - val_loss: 98.3216 - val_mse: 90.5759
Epoch 98/100
662/662 [==============================] - 2s 3ms/step - loss: 77.0843 - mse: 69.3374 - val_loss: 52.3031 - val_mse: 44.5554
Epoch 99/100
662/662 [==============================] - 2s 3ms/step - loss: 77.2531 - mse: 69.5041 - val_loss: 58.5692 - val_mse: 50.8194
Epoch 100/100
662/662 [==============================] - 2s 3ms/step - loss: 73.7785 - mse: 66.0277 - val_loss: 55.7968 - val_mse: 48.0450
<keras.callbacks.History at 0x7f9d279baeb0>
Below you will evaluate the performance of your model using the test data.
# test_lbl = np.array(test_ft.pop(label_name))
print("Model evaluation: \n")
model.evaluate(x=test_ft, y=test_lbl, batch_size=batch_size)
Model evaluation: 82/82 [==============================] - 0s 2ms/step - loss: 56.4856 - mse: 48.7338
[56.48563766479492, 48.733795166015625]
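Since the mse is in squared CO₂ units, taking its square root gives an error in the original units, which is easier to interpret. A small sketch using the two evaluation numbers printed above:

```python
# model.evaluate returned [loss, mse]; values copied from the output above.
loss, mse = 56.48563766479492, 48.733795166015625

# Root mean squared error, in the same units as vehicle_CO2.
rmse = mse ** 0.5
print(f"Test RMSE: {rmse:.2f}")  # -> Test RMSE: 6.98
```

Note that the loss is larger than the mse because the loss also includes the L1 kernel-regularization penalties added in the hidden layers.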
#Get a summary of your model
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
Features (DenseFeatures) multiple 0
Hidden1 (Dense) multiple 700
Hidden2 (Dense) multiple 1020
Hidden3 (Dense) multiple 210
Output (Dense) multiple 11
=================================================================
Total params: 1,941
Trainable params: 1,941
Non-trainable params: 0
_________________________________________________________________
Below we provide you with tables and figures for you to visualize your training results.
From TensorBoard, you can see the loss and mse curves of your training. Go to the graph and under "Tag", select "keras" to see your network. Note that you will see an error under "Tag: Default"; you can ignore that warning.
%tensorboard --logdir logs
Below, your trained model is used to make predictions on the test set. Remember, the test set is not used in training the model, so it gives a good indication of how your model is doing.
.predict(): predicts the output values from features given.
predicted_labels: contains the values ($CO_{2}$) our model predicts.
After the predicted and actual values are obtained, we create a plot for you to visualize the results. The dots show the predicted values and the line shows the target values.
%%time
# Get the features from the test set
test_features = test_ft
# Get the actual CO2 output for the test set
actual_labels = test_lbl
# Make prediction on the test set
predicted_labels = model.predict(x=test_features).flatten()
# Define the graph
Figure1 = plt.figure(figsize=(5,5), dpi=100)
plt.xlabel('Actual Outputs [Vehicle CO\u2082]')
plt.ylabel('Predicted Outputs [Vehicle CO\u2082]')
plt.scatter(actual_labels, predicted_labels, s=15, c='Red', edgecolors='Yellow', label='Predicted Values')
# Take the output data from 2000 to 3000 as an instance to visualize
lims = [2000, 3000]
plt.xlim(lims)
plt.ylim(lims)
plt.plot(lims, lims, color='Green', label='Targeted Values')
plt.legend()
WARNING:tensorflow:Layers in a Sequential model should only have a single input tensor, but we receive a <class 'dict'> input: {'vehicle_angle': <tf.Tensor 'IteratorGetNext:0' shape=(None,) dtype=float32>, 'vehicle_eclass': <tf.Tensor 'IteratorGetNext:1' shape=(None,) dtype=string>, 'vehicle_fuel': <tf.Tensor 'IteratorGetNext:2' shape=(None,) dtype=float32>, 'vehicle_noise': <tf.Tensor 'IteratorGetNext:3' shape=(None,) dtype=float32>, 'vehicle_pos': <tf.Tensor 'IteratorGetNext:4' shape=(None,) dtype=float32>, 'vehicle_route': <tf.Tensor 'IteratorGetNext:5' shape=(None,) dtype=string>, 'vehicle_speed': <tf.Tensor 'IteratorGetNext:6' shape=(None,) dtype=float32>, 'vehicle_type': <tf.Tensor 'IteratorGetNext:7' shape=(None,) dtype=string>, 'vehicle_waiting': <tf.Tensor 'IteratorGetNext:8' shape=(None,) dtype=float32>}
Consider rewriting this model with the Functional API.
CPU times: user 1.7 s, sys: 517 ms, total: 2.22 s
Wall time: 907 ms
<matplotlib.legend.Legend at 0x7f9d2b6e6580>
Below, the graph shows a histogram of the errors between predicted and actual values. If the errors are concentrated mostly around 0, the trained model is fairly accurate.
error = actual_labels - predicted_labels
Figure2 = plt.figure(figsize=(8,3), dpi=100)
plt.hist(error, bins=50, color='Red', edgecolor='Green')
plt.xlabel('Prediction Error [Vehicle CO\u2082]')
plt.ylabel('Count')
Text(0, 0.5, 'Count')
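Alongside the histogram, summary statistics of the error give a quick quantitative check: the mean error measures bias, and the standard deviation measures the spread you see in the histogram. A minimal sketch; the arrays below are illustrative stand-ins for actual_labels and predicted_labels, not real outputs:

```python
import numpy as np

# Illustrative stand-ins for actual_labels and predicted_labels.
actual = np.array([5286.0, 0.0, 2624.0, 8381.0, 9026.0])
predicted = np.array([5284.0, 0.0, 2625.0, 8373.0, 9019.0])

error = actual - predicted
print(f"mean error: {error.mean():.2f}")  # bias: should be near 0
print(f"error std:  {error.std():.2f}")   # spread of the histogram
```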
Below, a table puts the actual and predicted values side by side. HTML is used in this case.
from IPython.display import HTML, display
def display_table(data_x, data_y):
    html = "<table>"
    html += "<tr>"
    html += "<td><h3>%s</h3></td>" % "Actual Vehicle CO\u2082"
    html += "<td><h3>%s</h3></td>" % "Predicted Vehicle CO\u2082"
    html += "</tr>"
    for i in range(len(data_x)):
        html += "<tr>"
        html += "<td><h4>%s</h4></td>" % (int(data_x[i]))
        html += "<td><h4>%s</h4></td>" % (int(data_y[i]))
        html += "</tr>"
    html += "</table>"
    display(HTML(html))
display_table(actual_labels[0:100], predicted_labels[0:100])
Actual Vehicle CO₂ | Predicted Vehicle CO₂
5286 | 5284
0 | 0
2624 | 2625
8381 | 8373
9026 | 9019
16025 | 16039
0 | 0
0 | 0
3392 | 3398
0 | 0
5286 | 5287
2141 | 2146
0 | 0
6008 | 6003
0 | 0
7427 | 7421
2624 | 2624
7783 | 7785
0 | 0
0 | 0
4700 | 4702
0 | 0
0 | 0
0 | 0
0 | 0
0 | 0
0 | 0
3223 | 3237
5286 | 5282
10385 | 10355
0 | 0
7380 | 7390
13882 | 13891
45482 | 45460
5186 | 5189
0 | 0
0 | 0
2624 | 2627
18365 | 18371
5977 | 5971
0 | 0
16343 | 16365
0 | 0
0 | 0
2706 | 2700
0 | 0
6476 | 6463
9410 | 9408
7434 | 7445
0 | 0
6176 | 6164
7555 | 7562
0 | 0
18916 | 18917
0 | 0
0 | 0
7836 | 7840
0 | 0
0 | 0
0 | 0
2931 | 2933
3271 | 3283
0 | 0
5514 | 5522
0 | 0
0 | 0
3565 | 3560
12180 | 12192
3632 | 3631
0 | 0
8769 | 8772
0 | 0
5286 | 5290
0 | 0
10815 | 10819
7090 | 7091
0 | 0
0 | 0
0 | 0
29725 | 29729
0 | 0
0 | 0
3580 | 3582
6814 | 6816
2878 | 2885
6564 | 6562
5197 | 5192
5200 | 5213
0 | 0
5011 | 5000
0 | 0
0 | 0
0 | 0
0 | 0
3101 | 3097
2607 | 2605
7759 | 7772
4757 | 4771
0 | 0
0 | 0
Congratulations on finishing the lab. Please click "File -> Print Preview" and a separate page should open. Press Cmd/Ctrl + P to print and select "Save as PDF". Submit this .ipynb notebook file, the PDF, and the loss graph screenshots to the link specified in the Google Doc.